PREDICTING ATTRITION:

A TEST  OF  ALTERNATIVE  APPROACHES 

Robert  F.  Lockman  and  John  T.  Warner 
Center  for  Naval  Analyses 

SLIDE  1 

We  are  going  to  describe  (1)  the  background  of  predicting 
premature  enlisted  attrition  in  the  military  service,  (2)  four 
competing  approaches  to  predicting  this  attrition,  (3)  a test 
of  these  approaches,  and  (4)  the  implications  of  the  results  for 
recruiting  policy. 

BACKGROUND 


The history of predicting premature attrition, that is,
losses before the completion of the first term of military
service, dates back at least to the early 1960s. At that time,
researchers in the Navy, Army, and Air Force found that the best
pre-service predictors of premature attrition were, in order,
level of education, mental ability, and age (references 1, 2,
and 3). The multiple correlation of these three predictors with
various measures of attrition was about .35 for all three services.


In-service measures of performance and ratings of behavior
increased the predictability of attrition, but they could not be
used for screening out potential recruits who were high loss
risks. Personality tests have also been related to premature
attrition with varying degrees of success, but they must be
specially administered to applicants if they are to be used for
screening purposes.

Criticisms have been made of these past studies and current
ones that employ personal characteristics and entry test scores to
predict premature attrition (reference 4). The low value of the
correlation of the predictors with the stay/attrite criterion
has been cited, e.g., the R of .35 mentioned earlier. However,
this magnitude of correlation compares favorably with the validity
coefficients of measures used to predict occupational performance
in the civilian and military worlds (reference 5). It has been
said that the low level of predictability is due to a decreasing
diversity of the AVF manpower pool, which limits the degree of
correlation that can be achieved. But if this were true, the
correlation could be corrected for such restriction without too much
effort. The use of "static" personal characteristics and entry
test scores has also been criticized because important "dynamic"
situational or organizational variables are ignored. The
desirability of investigating such measures for in-service classification
and assignment purposes is evident (we ourselves are currently
doing this for the Navy), but their reliability and validity for
predicting attrition in conjunction with the "static" measures still
must be demonstrated. Finally, it has also been said that the use of
personal characteristics and entry test scores results in
self-fulfilling prophecies of attrition: if men are thought to be dumb
and uneducated, they will be expected to fail and, therefore, will
fail. There are compelling reasons for not labeling men with
educational levels and mental groups, but at the same time our society
places different values on these characteristics, and it is
gratuitous to expect the services to do otherwise.

In any event, attrition, like death and taxes, is always with
us, and today it is with us more than it was during the draft era.
The three- to four-year premature loss rates in the 1960s ran from
about 25 to 30 percent. Today, the comparable rates are 30 to 40
percent (references 1, 2, 3, and 6).

SLIDE  2 

Costs of premature attrition are up, not only absolutely but
relatively, with the higher pay for today's volunteers and increased
recruiting and training costs. The Navy estimates that it costs
$1,500 just to "access and dress" a non-prior-service recruit;
another $1,500 to get him (or her) through 8 weeks of recruit
training; another $400 for two weeks of apprentice training for those
who do not go to Class A (technical training) schools; and about
$1,800 for technical training that averages 6 weeks (references 7
and 8).

These stages occur before a man is assigned to the fleet and
becomes a productive member of the Navy. And as men are lost
anywhere along the line, the toll mounts up. The costs of
administrative and disciplinary discharges, unauthorized absences, desertion,
disciplinary measures, medical procedures, and the burden of dealing
with unproductive losses-to-be also must be added to the bill.

In  sum,  then,  premature  losses,  even  of  the  voluntary  type 
now  undergoing  experimental  review  in  the  Navy,  are  significant 
and  expensive.  Since  personal  characteristics  and  test  scores 
are  useful  for  screening  out  loss-prone  applicants,  the  question 
is,  what  is  the  best  approach  for  doing  so? 

ALTERNATIVE  APPROACHES 

When we talk about the "best" approach for screening out
loss-prone applicants, we mean the most valid and least expensive,
subject to the available supply of manpower. If the pool of potential
recruits is so small that virtually all applicants have to be taken
to meet manpower requirements, then screening is useful only for
putting a "watch out" tag on a man whose chances of completing an
initial tour are dim. If there is flexibility in whom we can
take, screening becomes more useful in denying entry to the poorer
risks.

There are two bases for screening. The first one is actuarial.
With a sufficiently large recruit cohort, actual loss rates could
be calculated for men with different patterns of characteristics.
The trouble here, even when data are available on hundreds of thousands
of men, is that we cannot be sure which are the most important
characteristics, and combinations thereof, that relate to losses.


Statistical  approaches  to  predicting  attrition  overcome  the 
drawbacks  of  the  actuarial  approach.  They  let  us  know  what  the 
significant  combinations  of  characteristics  are  that  relate  to 
losses  and  smooth  out  the  projected  rates. 

SLIDE  3 

There are two main but different statistical approaches that
can be taken, with two variants of each. The main approaches are
linear and non-linear in form, with the variants being the use of
either individual or grouped observations.*

The linear approach with individual observations is the most
common. It was used in the early work of Plag, Caylor, and Flyer
for the Navy, Army, and Air Force, respectively. Recently, it has
been applied by the Navy Personnel R&D Center. The grouped
linear and non-linear approaches are ones that I used recently for
the Navy. The individual non-linear approach has been proposed by
Dempsey and Fast to the Air Force.

Let  us  briefly  look  at  the  main  features  of  these  approaches 
and  compare  their  pros  and  cons. 

The linear approach with individual observations is the most
familiar one. Numerous computer programs for regression analysis
using this approach are available. These programs can easily handle
very large samples of men and many predictor variables in a
one-stage analysis. The major disadvantage of the individual linear
approach is that it may not be efficient, especially when the
relationship of the predictors to the chances of attriting is not linear.

*See the appendix for a technical discussion of these approaches.

Whereas the individual linear approach uses a binary dependent
variable, stay-attrite, the grouped approaches use loss rates
(linear) or the log of the odds of loss (non-linear) for
groups of men defined by all possible combinations of the
predictors. An example of a group is recruits with 12 years of
education, MG II, age 17, Caucasian, and no dependents. The groups are
weighted to take account of their varying size in a regression
analysis that is similar to the one performed with the individual
linear approach.
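
The grouping step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually run on the Navy data: the six records are made up, and the names `group_into_cells` and `cell_summary` are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical records: five predictor categories, then 1 = attrited, 0 = stayed.
records = [
    ("12ED", "MGII", "AGE17", "CAUC", "NDEPS", 0),
    ("12ED", "MGII", "AGE17", "CAUC", "NDEPS", 1),
    ("12ED", "MGII", "AGE17", "CAUC", "NDEPS", 0),
    ("12ED", "MGII", "AGE17", "CAUC", "NDEPS", 0),
    ("LT12ED", "MGIV", "AGE17", "CAUC", "NDEPS", 1),
    ("LT12ED", "MGIV", "AGE17", "CAUC", "NDEPS", 0),
]


def group_into_cells(rows):
    """Return {cell: (n, losses)}, where a cell is one combination of predictors."""
    cells = defaultdict(lambda: [0, 0])
    for *predictors, attrited in rows:
        cell = tuple(predictors)
        cells[cell][0] += 1          # men in the cell
        cells[cell][1] += attrited   # losses in the cell
    return {cell: (n, losses) for cell, (n, losses) in cells.items()}


def cell_summary(n, losses):
    """Loss rate (grouped linear) and log odds of loss (grouped non-linear)."""
    rate = losses / n
    # The log odds are undefined when the observed rate is exactly 0 or 1.
    log_odds = math.log(rate / (1.0 - rate)) if 0.0 < rate < 1.0 else None
    return rate, log_odds


for cell, (n, losses) in group_into_cells(records).items():
    print(cell, n, cell_summary(n, losses))
```

Cells with no losses (or all losses) have no finite log odds, which is one practical reason the grouped approaches need enough men in every cell.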

Both grouped approaches require redefinition or pooling of
groups and an additional regression when a predictor variable is
found not to be significantly related to the dependent variable.
Both also require very large samples with even small numbers of
predictors. Because of the large number of possible combinations
of the predictors, enough men must be found in the groups to
produce reliable loss rates.*

The grouped linear approach has the same major disadvantage
as its individual counterpart when the relationship of predictors
and loss rates is not linear. The grouped non-linear approach avoids
this problem.

*In our case, we have 3 levels of education, 5 of mental group,
3 of age, and 2 each of race and dependents. The product of these
is 180, the number of possible groups.

All of the approaches so far rely on ordinary least squares
regressions to solve their attrition equations, even the grouped
non-linear approach. The non-linear individual approach is
estimated by a different method, maximum likelihood. It can handle
equations where the dependent variable is not a simple linear
combination of the predictors (as can the grouped non-linear).
However, in some cases, it may be the most time-consuming approach
computationally. This is especially true when large numbers of
variables and large samples are used, because of the iterative
searching for the best fit to the data.

In this age of computers and ability to process massive
amounts of data, the major question about the four approaches just
described is, does it make any difference which one is used with
the same data base?

We  sought  to  answer  this  question  by  using  the  same  set  of 
predictors  for  67,000  non-prior  service  males  who  joined  the 
regular  Navy  in  calendar  1973.  The  object  was  to  predict  the 
attrition  experience  for  these  recruits  after  each  one  had  had  the 
opportunity  to  be  in  the  Navy  for  one  year. 


SLIDE  4 


PREDICTORS 


LT12ED     - less than high school graduation
*12ED      - high school graduation
GT12ED     - more than high school graduation
MG I       - mental group AFQT percentiles 93 and above
MGII       - mental group AFQT percentiles 65 to 92
*MGIIIU    - mental group AFQT percentiles 49 to 64
MGIIIL     - mental group AFQT percentiles 31 to 48
MG IV      - mental group AFQT percentiles 30 and below
AGE 17     - 17 years old
*AGE18-19  - ages 18 and 19
AGE20+     - age 20 or older
*CAUC      - Caucasians
NON-CAUC   - Non-Caucasians
PDEPS      - primary dependents (wife, children)
*NDEPS     - no primary dependents

The predictors were all dichotomous or binary variables used
to maintain consistency with current Navy selection procedures.
They are shown on the slide.

RESULTS 

We separated the CY 1973 Navy enlisted cohort into two samples
by alternately assigning the individuals in the data file to
validation and cross-validation samples, respectively. The two
samples were virtually identical in terms of their
characteristics and average first-year attrition rate, which was about 17.5
percent. Then, each of the four approaches or models was estimated
with the validation sample, producing four fitted equations.*
Each of these equations contained the same independent variables
or predictors previously mentioned.

We then determined how well each equation predicted the
attrition in the cross-validation sample. Our procedure for judging
the "goodness of fit" was as follows. First, we used each fitted
equation to predict the probability that each individual in the
cross-validation sample would be a "stayer" rather than an "attriter"
(which is one minus the individual's predicted attrition
probability). The Navy calls the probability of staying the
individual's SCREEN score. SCREEN stands for Success Chances
for REcruits Entering the Navy. Then we picked a critical SCREEN
cut score, the score that separates people who will be accepted
from those who will be rejected, and looked at the pattern of
results.

*The parameter estimates for the different models are shown in
appendix B.

SLIDE  5 

We  looked  at: 

(1)  How  many  of  the  predicted  stayers  actually  stayed, 

(2)  How many of the predicted attriters actually attrited,

(3)  How  many  of  the  predicted  stayers  actually  attrited, 
and,  finally, 

(4)  How  many  of  the  predicted  attriters  actually  stayed. 

The sum of (1) and (2) is the number of correct predictions, or
"hits." Those who were predicted to stay but who attrite are
called "false positives," and those who were predicted to attrite
but actually stay are called "false negatives." Note that the
percentage of false positives is the attrition rate the services would
experience if they only took applicants with a SCREEN score above
the cut score.
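
The four-way tally above can be written out as a short Python sketch. The scores and outcomes are invented for illustration, and the names `classify` and `accepted_attrition_rate` are hypothetical. Two assumptions are made: scores at or below the cut are labeled attriters, and the "attrition rate experienced" is read as the share of accepted applicants who attrite.

```python
def classify(screen_scores, stayed, cut_score):
    """Tally predictions against outcomes at a given SCREEN cut score.

    screen_scores: predicted chance of staying, in percent.
    stayed: True if the recruit actually stayed.
    Assumption: a score at or below the cut is a predicted attriter.
    """
    counts = {"hit_stay": 0, "hit_attrite": 0,
              "false_positive": 0, "false_negative": 0}
    for score, did_stay in zip(screen_scores, stayed):
        predicted_stay = score > cut_score
        if predicted_stay and did_stay:
            counts["hit_stay"] += 1          # (1) predicted stayer who stayed
        elif not predicted_stay and not did_stay:
            counts["hit_attrite"] += 1       # (2) predicted attriter who attrited
        elif predicted_stay:
            counts["false_positive"] += 1    # (3) predicted stayer who attrited
        else:
            counts["false_negative"] += 1    # (4) predicted attriter who stayed
    return counts


def accepted_attrition_rate(counts):
    """Share of accepted applicants (predicted stayers) who attrite."""
    accepted = counts["hit_stay"] + counts["false_positive"]
    return counts["false_positive"] / accepted if accepted else 0.0


# Hypothetical cross-validation data, for illustration only.
scores = [90, 85, 78, 74, 70, 65]
stayed = [True, True, True, False, True, False]
for cut in (71, 76):
    tally = classify(scores, stayed, cut)
    print(cut, tally, accepted_attrition_rate(tally))
```

Running the tally at more than one cut score, as in the loop, is how the approaches are compared in the remainder of the briefing.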

The success of each approach is judged by the percentages of
hits, false positive and false negative predictions. As we will
see, there is a tradeoff in identifying false positives and false
negatives; you can reduce the percentage of false negative
predictions only by increasing the percentage of false positive
predictions. The "goodness" of a particular approach should be judged
according to which percentage you are attempting to minimize, as
well as by the percentage of hits.

We looked at three different cut scores in comparing the
alternative approaches. We will also see that the performance of
the different approaches is crucially dependent upon the cut
score chosen. The first cut score is 80, which was the mid-point
between the average SCREEN score of the actual stayers and the
average score of the actual attriters. This score was chosen
because this mid-point is conventionally used for classification
purposes. The second cut score is 71, which was chosen because
it is the Navy's current cut score. The third cut score is 76,
which was selected because the Navy is considering raising the
score to 76. In our comparisons, individuals with SCREEN scores of
80 and below, 76 and below, or 71 and below, respectively, will
be labeled attriters, and those with higher scores will be labeled
stayers.
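
The mid-point rule behind the first cut score can be illustrated as follows. This is a sketch with made-up scores; `midpoint_cut_score` is a hypothetical helper, not part of any Navy procedure.

```python
from statistics import mean


def midpoint_cut_score(screen_scores, stayed):
    """Mid-point between the mean score of actual stayers and of actual attriters."""
    stayer_scores = [s for s, did_stay in zip(screen_scores, stayed) if did_stay]
    attriter_scores = [s for s, did_stay in zip(screen_scores, stayed) if not did_stay]
    return (mean(stayer_scores) + mean(attriter_scores)) / 2


# Hypothetical scores: stayers average 85, attriters average 65.
print(midpoint_cut_score([90, 80, 70, 60], [True, True, False, False]))
```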

SLIDE  6 

Now let us look at specific results. Here are the
percentages of the sample that would be labeled attriters and therefore
rejected under the alternative approaches and cut scores. As
you can see, if the cut score is 71, about the same percentage of
the cohort would be labeled attriters and therefore rejected under
all four approaches. However, when the cut score is raised to
76 or 80, some differences between approaches emerge. If cut
scores are based on either of the two linear models, a higher
percentage of individuals would be rejected than when they are
based on either of the two non-linear models.1

1See figure B-1 in appendix B.


PERCENT OF COHORT REJECTED AT VARIOUS CUT SCORES
UNDER DIFFERENT APPROACHES

Let us now examine the percentage of hits, false positives,
and false negatives obtained with each approach. Look first at
the results for a cut score of 71. For this cut score, the
percentages of hits, false positives, and false negatives are about
the same for all four approaches. For the higher cut scores,
however, the non-linear models outperform the linear ones in terms
of hits and false negatives. The percentage of hits is higher for
the non-linear approaches. The difference in hits between the
linear and non-linear approaches is most pronounced when the cut
score is 80. The percentage of false negatives is slightly lower
for the non-linear approaches at a cut score of 76, but considerably
lower at a score of 80. Remember that false negatives are those
individuals predicted to attrite who actually stay.

Let's now look at the false positives. The percentage of
false positives is the attrition rate that would actually be
experienced. It is clear that higher cut scores lead to lower
attrition rates. Now, it does appear that, at given cut scores,
there would be more attrition when a SCREEN table based on the
non-linear approaches is used. There is a reason for the higher
attrition under the non-linear approaches: they admit more people
than the linear approaches, as we saw a few moments ago. The
additional recruits admitted have somewhat higher attrition chances
than the group already taken, and this raises the attrition rate
of the selected cohort. However, this increase in attrition rates
is small relative to the increased percentage of applicants
accepted and the decreased percentage of false negative
predictions using the non-linear approaches.

PERCENTAGE OF CORRECT, FALSE NEGATIVE, AND FALSE POSITIVE PREDICTIONS


SLIDE  8 

Our conclusions are shown on the next slide. If the cut
score is 71, the score currently used by the Navy for general
recruiting purposes, all four approaches will admit about the
same number of recruits from any given cohort. Further, all four
approaches produce about the same percentages of correct
predictions ("hits"), false positives (predicted stays who attrite),
and false negatives (predicted attrites who stay). At higher cut
scores, the non-linear approaches are slightly better than the
linear ones in that they admit more people from any given cohort,
while yielding at least as high a percentage of correct predictions
("hits") and a lower percentage of false negatives (predicted
attrites who stay). The non-linear approaches do, however, imply
slightly higher actual attrition, since more people would be taken
in using SCREEN tables based on these approaches.

The services are now under pressure from OSD and Congress to
reduce first-term attrition, and one way to do this is to raise
the cut score. As I mentioned earlier, the Navy is considering
raising its cut score from 71 to 76. While the results with the
alternative approaches at a cut score of 71 were not very
different, they are at a cut score of 76. Of a cohort of 100,000
applicants, about 2,000 more would be screened out using one
of the linear approaches rather than one of the non-linear
approaches. Since the supply of manpower is limited and growing
more so all the time, the services do not want to reject more
applicants than is absolutely necessary to achieve some desired
attrition rate. The more stringent the cut score, the better the
non-linear approaches, since they do not unnecessarily screen out
applicants and since they produce more hits and fewer false
negatives.

Let me close by noting one thing that remains to be done.
This is to identify the optimal cut score. Raising the cut score
is a way of reducing first-term attrition, but such a policy
entails the cost of a reduced supply of acceptable manpower. This
way of reducing attrition should be pursued only if the marginal
costs of attrition exceed the costs imposed because end-strength
goals are not met. Our future work will try to get at these costs
and determine the optimal cut score.

APPENDIX A


ALTERNATIVE  MODELS  FOR  ESTIMATING 
ATTRITION  PROBABILITIES 


Given the variables thought to influence attrition, the goal
is to estimate the probability that an individual will attrite.
Let $X = (X_1, \ldots, X_K)$ be the vector of variables (the characteristics
of the individual, such as mental ability and educational level)
thought to affect attrition. Then, with $n$ observations on
individuals who have been in military service, of which $n_1$ individuals
were attriters and $n_0 = n - n_1$ individuals were non-attriters, we
want to estimate an equation for the probability that an individual
with a given set of characteristics (X vector) will attrite. The
estimated equation may then be used for prediction purposes. In
this case, the dependent variable is binary and assumes a value of
1 if the individual attrites and 0 if he does not. Models that
incorporate such dependent variables are called limited dependent
variable models.

There are two classes of limited dependent variable models.
One posits a linear cumulative distribution function; the
other posits an S-shaped or sigmoid cumulative distribution. For
the sake of exposition, we will refer to them as linear and
non-linear models, respectively.

LINEAR  MODELS 

Linear models are estimated by ordinary least squares (OLS).
The method is simply to estimate the following regression equation:

(1)  $Y_i = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_K X_{iK} + \varepsilon_i$

The dependent variable in this regression, $Y_i$, depends upon whether
the data is grouped or ungrouped.

Individual  Linear  Probability  Model 

If the linear model is based on the individual observations,
the dependent variable is assigned the value 1 if the individual
attrites and the value 0 if he does not. We call this the individual
linear model. This model was used by Plag to estimate attrition
probabilities from the Navy (reference 1).
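
As a small illustration of the individual linear model, the sketch below fits an OLS line to a 0/1 attrition indicator using a single 0/1 predictor. The data are invented and `ols_one_predictor` is a hypothetical helper; the models discussed here use many predictors at once, which requires a general regression routine rather than this one-variable formula.

```python
def ols_one_predictor(x, y):
    """Ordinary least squares for y = b0 + b1 * x with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: x = 1 for non-graduates, y = 1 for attriters.
# 2 of 10 graduates and 4 of 10 non-graduates attrite.
x = [0] * 10 + [1] * 10
y = [1, 1] + [0] * 8 + [1, 1, 1, 1] + [0] * 6

b0, b1 = ols_one_predictor(x, y)
# With a single dummy, b0 is the graduates' loss rate and
# b0 + b1 is the non-graduates' loss rate.
print(b0, b0 + b1)
```

With one dummy variable the fitted values are simply the two group loss rates, which shows why the coefficients can be read directly as differences in attrition probability.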

The individual linear model is closely related to the linear
discriminant function (LDF), first proposed by Fisher (reference 4)
in 1936 as a means for identifying binary group membership on the
basis of a linear combination ($\lambda_1 X_1 + \lambda_2 X_2 + \ldots + \lambda_K X_K$) of known
characteristics. It can be shown that the "best" LDF weights to
place on the characteristics (the X's) are directly proportional
to the individual linear regression coefficients.1 In our case,
therefore, the discriminant function solution to separating applicants
who belong to the population called attriters from the applicants
who belong to the population called non-attriters would be based
on a linear regression on a binary dependent variable.


1See Maddala (reference 11). The factor of proportionality between
discriminant function weights and OLS regression coefficients is
the residual sum of squares from the OLS regression divided by
$n-2$.




The individual linear model is appealing because of the
computational ease of OLS and because OLS is capable of handling very
large sample sizes. On the other hand, it has some shortcomings.
The most frequently cited difficulties are that (1) the error term
($\varepsilon_i$ in (1) above) is not normally distributed, (2) the error term
does not have a constant variance, and (3) there is no restriction
to predicting a probability between 0 and 1, although a prediction
outside of this range is inadmissible. The first and third criticisms
are not so serious,1 but the second criticism implies that even
within the class of linear models, the individual linear approach
is not a fully efficient estimation procedure.2


1The first difficulty implies that t tests for significance of
regression coefficients are not exact tests. Maddala (reference 11)
shows that, despite the binary form of the dependent variable in
the linear probability model, the t tests for the regression
coefficients are exact tests. The third cited difficulty is not
really a problem either. The services would always take individuals
with predicted attrition probabilities less than zero and screen
out individuals with predicted probabilities exceeding unity.
With large samples, predictions outside the limits of 0 and 1
will occur infrequently anyway.

2The error variance may be shown to be

$\left(\sum_j \beta_j X_{ij}\right)\left(1 - \sum_j \beta_j X_{ij}\right)$

and is a function of the values of X. Since the error variance is not
constant, the OLS estimates of the $\beta$'s are not the most efficient,
i.e., minimum variance, linear estimates.

An alternative to the linear probability model based on the
individual observations is the grouped linear probability model.
In this model, the individual observations are grouped into
cells on the basis of combinations of the X's, and the dependent
variable is the proportion $\hat{P}_i = a_i / n_i$ of the $n_i$ individuals in the ith
cell who were attriters. $\hat{P}_i$ is an estimate of the true probability
$P_i$ that individuals with a given set of characteristics will
attrite. The total number of cells is the product, over the number
of variables, of the number of intervals for each variable. Thus,
if there are 3 education categories (e.g., <12 years, 12 years,
>12 years), 5 mental categories (I, II, IIIU, IIIL, IV and V), 3
age categories (<18, 18-19, >19), and 2 race groups (Caucasians and
non-Caucasians), there would be 90 cells. To estimate the $\beta$'s, $\hat{P}_i$
is regressed on categorical, or binary, variables representing the
different levels of each independent variable.
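
The cell count is a plain product of category counts, which the following sketch checks by enumeration. The dictionary keys and level labels are illustrative names only, and the fifth variable (dependents) is the one used in the main text to arrive at 180 groups.

```python
from itertools import product
from math import prod

# Category levels as described in the text; the labels are illustrative.
levels = {
    "education": ["<12", "12", ">12"],
    "mental_group": ["I", "II", "IIIU", "IIIL", "IV and V"],
    "age": ["<18", "18-19", ">19"],
    "race": ["Caucasian", "non-Caucasian"],
}


def count_cells(levels_by_variable):
    """Number of cells = product of the number of levels of each variable."""
    return prod(len(v) for v in levels_by_variable.values())


def enumerate_cells(levels_by_variable):
    """Every combination of levels, one tuple per cell."""
    return list(product(*levels_by_variable.values()))


print(count_cells(levels))                      # 3 * 5 * 3 * 2 = 90
with_dependents = dict(levels, dependents=["primary dependents", "none"])
print(count_cells(with_dependents))             # 90 * 2 = 180
```

Each added predictor multiplies the number of cells, which is why the grouped approaches need very large samples even with few predictors.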

In cells which contain small numbers of observations, $\hat{P}_i$ may
not be a good estimator of the true probability $P_i$. The variance
of $\hat{P}_i$ is $P_i(1-P_i)/n_i$ and is inversely related to $n_i$, the number of
observations in the cell. Since $\hat{P}_i$ does not have constant variance,
neither does the error term in the regression, and the regression
estimates of the $\beta$'s are not minimum variance estimates. This [...]

In cells which contain more observations, $\hat{P}_i$ is a lower variance
estimate of the true attrition probability; hence, in the
regression more weight is given to those cells which contain the largest
numbers of observations.

Even if the individual linear and grouped linear approaches were
fully efficient linear estimation procedures, they have a potential
shortcoming. A plot of $\hat{P}_i = \sum_{j=1}^{k} \hat{\beta}_j X_{ij}$, where the $\hat{\beta}_j$'s are the
estimated coefficients, yields a straight line, because the linear
probability models have linear cumulative distribution functions.
However, studies have found that the plot of the actual P's (the
cell proportions in the grouped linear model) against $\sum_{j=1}^{k} \hat{\beta}_j X_{ij}$
frequently takes the form of an S-shaped curve, or sigmoid (reference
12). If the cumulative distribution is S-shaped rather than linear,
the linear probability models may provide poor fits to the data.
Models which imply S-shaped cumulative distributions, in which the
probability of attriting is not a simple linear function of its
predictors, may provide more accurate fits to the data.

NON-LINEAR  MODELS 

Probability distributions which have S-shaped cumulative
distributions can be employed to estimate the $\beta$'s. The two most
common ones are the logistic and normal distributions. In each
of these distributions, the random variable $Z$ is assumed to be a
linear function of $X_1 \ldots X_K$, that is, $Z = \sum_{j=0}^{K} \beta_j X_j$ (where $X_0 = 1$).

Individual Logistic Distribution

Since the logistic distribution has the form $P = \frac{e^{-Z}}{1+e^{-Z}}$,
the function to be estimated is given in (2).

(2)  $P = \frac{\exp\{-(\beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K)\}}{1 + \exp\{-(\beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K)\}} = \frac{1}{1 + \exp\{\beta_0 + \beta_1 X_1 + \ldots + \beta_K X_K\}}$

Equation (2) is a non-linear equation which may be estimated
by the method of maximum likelihood (ML). To estimate (2), the
likelihood function L is formed, and that set of $\beta$'s which maximizes
the value of L is found. Since individual observations are used,
this model is called the individual logistic model. The
likelihood function is:

(3)  $L = \prod_{Y_i=1} \frac{1}{1 + \exp\{\sum_j \beta_j X_{ij}\}} \prod_{Y_i=0} \frac{\exp\{\sum_j \beta_j X_{ij}\}}{1 + \exp\{\sum_j \beta_j X_{ij}\}}$

Since (3) is not a simple linear expression, the $\beta$'s have to be
estimated using non-linear techniques.
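
The iterative search can be illustrated with a minimal Newton-Raphson sketch for one predictor plus a constant. It follows the sign convention of the right-hand form of (2), in which the attrition probability is 1/(1 + exp(b0 + b1*x)). The data are invented, the names `attrition_prob` and `fit_logit` are hypothetical, and production ML routines add convergence checks and handle many predictors.

```python
import math


def attrition_prob(b0, b1, x):
    """Attrition probability in the form 1 / (1 + exp(b0 + b1 * x))."""
    return 1.0 / (1.0 + math.exp(b0 + b1 * x))


def fit_logit(x, y, iterations=25):
    """Maximize the log likelihood by Newton-Raphson (y = 1 for attriters)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = attrition_prob(b0, b1, xi)
            w = p * (1.0 - p)
            # Gradient of the log likelihood under this sign convention.
            g0 += p - yi
            g1 += (p - yi) * xi
            # Negative Hessian: sum of w * [1, x][1, x]'.
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (negative Hessian) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 2 of 10 attrite when x = 0, 4 of 10 when x = 1.
x = [0] * 10 + [1] * 10
y = [1, 1] + [0] * 8 + [1, 1, 1, 1] + [0] * 6

b0, b1 = fit_logit(x, y)
print(attrition_prob(b0, b1, 0), attrition_prob(b0, b1, 1))
```

Every pass of the outer loop touches every observation, which is why this approach becomes time-consuming with large samples and many variables.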

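The other most frequently assumed probability distribution in
maximum likelihood is a normal distribution with unit variance.
In this case, the attrition probability is given in (4):

(4)  $P = \int_{-\infty}^{-\sum_j \beta_j X_j} \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2} t^2\}\, dt$

The likelihood function for the normal distribution is the following:

(5)  $L = \prod_{Y_i=1} P_i \prod_{Y_i=0} (1 - P_i)$

where $P_i$ is the attrition probability in (4) evaluated at the
characteristics of the ith individual. Again, we find the $\beta$'s that
maximize L, and this has to be done using iterative methods. This
model is called the probit model.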

Since the probit model is based on a normal distribution with
unit variance, the parameters $\beta_j$ are all scaled by a factor
$1/\sigma$, where $\sigma$ is the unknown standard deviation. $\sigma$ is not separately
estimable, and it is arbitrarily assumed to be unity. The probit
model was used by Dempsey and Fast (reference 3) to estimate
attrition probabilities from the Air Force Academy.

While the logit and probit models look different, their
cumulative distributions are very similar. Suppose that $Z_1$ is a random
variable distributed normally with unit variance and $Z_2$ is a random
variable distributed logistically. It may be shown that $Z_2$ has
variance $\pi^2/3$. Further, it may be shown that $Z_2$ divided by its
standard deviation, $\pi/\sqrt{3}$, is distributed approximately normally with
unit variance. Therefore, $Z_2 = \sum_j \hat{\beta}_j X_j$ from the logit model need only
be multiplied by $\sqrt{3}/\pi$ to be comparable to $Z_1 = \sum_j \hat{\beta}_j X_j$ obtained from
the probit model. The estimates differ only by the scale factor
$\sqrt{3}/\pi$. Therefore, ML logit is virtually identical to ML probit
(and vice versa).
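
The closeness of the two cumulative distributions after rescaling can be checked numerically. The sketch below uses only the Python standard library; the grid limits and the name `max_cdf_gap` are arbitrary choices made for illustration.

```python
import math

SCALE = math.sqrt(3.0) / math.pi  # converts logistic units to unit-variance units


def logistic_cdf(z):
    """Cumulative distribution of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Cumulative distribution of the unit-variance normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def max_cdf_gap(lo=-6.0, hi=6.0, steps=1200):
    """Largest absolute gap between the logistic CDF and the rescaled normal CDF."""
    gap = 0.0
    for k in range(steps + 1):
        z = lo + (hi - lo) * k / steps
        gap = max(gap, abs(logistic_cdf(z) - normal_cdf(z * SCALE)))
    return gap


print(max_cdf_gap())
```

On this grid the largest gap is a few hundredths of a probability point, consistent with the statement that the two models give nearly the same predicted probabilities.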

Grouped  Logistic  Model 

With large amounts of data, the $\beta$'s in (2) can be estimated
using linear regression. The probability function in (2) can be
transformed into the following log-linear equation, which may be
estimated with OLS:

(6)  $\ln\left(\frac{P}{1-P}\right) = -(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_K X_K)$

The dependent variable here is the logarithm of the odds of being
an attriter, estimated by grouping the data into cells, just as
in the grouped linear, and then using $\ln(\hat{P}_i/(1-\hat{P}_i))$ rather than
$\hat{P}_i$ as the dependent variable in the regression. The grouped
linear regression procedure was utilized by Lockman (reference 2)
to estimate attrition probabilities from the Navy.

The error term in the grouped logit regression is non-constant
and has the variance $\frac{1}{n_i P_i (1-P_i)}$. Therefore, weighting by the
inverse of its estimated standard deviation, $\sqrt{n_i \hat{P}_i (1-\hat{P}_i)}$, yields
a model with a constant variance error term. Again, this procedure
places the largest weights on those cells containing the largest
number of observations.
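
The weighting scheme can be sketched for the simplest case of one 0/1 predictor. The cell counts are invented and the names `logit_and_weight` and `weighted_fit` are hypothetical. The regression is run on the log odds of attrition, $\ln(\hat{P}_i/(1-\hat{P}_i))$, so the fitted coefficients c0 and c1 are on that scale.

```python
import math


def logit_and_weight(n, losses):
    """Log odds of attrition for a cell and the weight sqrt(n * p * (1 - p))."""
    p = losses / n
    return math.log(p / (1.0 - p)), math.sqrt(n * p * (1.0 - p))


def weighted_fit(cells):
    """Weighted least squares of cell log odds on a constant and a 0/1 dummy.

    cells: list of (x, n, losses). Each row is multiplied by its weight and
    the 2 x 2 normal equations are then solved directly.
    """
    rows = []
    for x, n, losses in cells:
        y, w = logit_and_weight(n, losses)
        rows.append((w, w * x, w * y))  # weighted constant, dummy, dependent variable
    s00 = sum(c * c for c, _, _ in rows)
    s01 = sum(c * d for c, d, _ in rows)
    s11 = sum(d * d for _, d, _ in rows)
    t0 = sum(c * y for c, _, y in rows)
    t1 = sum(d * y for _, d, y in rows)
    det = s00 * s11 - s01 * s01
    c0 = (s11 * t0 - s01 * t1) / det
    c1 = (s00 * t1 - s01 * t0) / det
    return c0, c1


# Hypothetical cells: (dummy value, men in cell, losses in cell).
cells = [(0, 400, 60), (0, 100, 20), (1, 200, 60), (1, 50, 20)]
print(weighted_fit(cells))
```

Because each squared weight is proportional to the cell size, the two large cells dominate the fit, which is the behavior described in the paragraph above.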
APPENDIX B


THE EMPIRICAL EQUATIONS OBTAINED WITH ALTERNATIVE APPROACHES

Table B-1 contains the parameter estimates obtained by
applying the alternative statistical procedures to the data. The
parameter estimates in the first column labeled individual logit
were obtained by the method of maximum likelihood. The parameter
estimates in the other three columns were obtained by the method of
ordinary least squares. The numbers in the first two columns are
estimates of the $\beta$'s in the logit probability function
$P = \frac{1}{1 + e^{\sum_j \beta_j X_j}}$. The numbers in the last two columns may be
interpreted as estimates of the $\beta$'s in the linear probability
function $P = \sum_j \beta_j X_j$.

The "t" values for the different variables are in parentheses.
(The "t" values for the individual logit parameter estimates are
asymptotic "t" values; see Zedlewski (reference 12).) A "t" value
of 1.96 or greater indicates that the coefficient is significantly
different from zero at the .05 level; a "t" value of 2.58 or greater
indicates significance at the .01 level.


ESTIMATES OF PARAMETER VALUES


Figure B-1: Cumulative Predicted Score Distributions for the Four Approaches